Development and Evaluation of a Korean Treebank and its Application to NLP
Abstract
This paper discusses issues in building a 54-thousand-word Korean Treebank using phrase structure annotation, along with the development of annotation guidelines based on the morpho-syntactic phenomena represented in the corpus. It presents the various methods employed for quality control, an evaluation of the quality of the Treebank, and some of the NLP applications under development that use the Treebank.
Related resources
Automatic Error Correction in a Syntactic Treebank Using Transformation-Based Machine Learning
A treebank is one of the most useful resources for supervised or semi-supervised learning in many NLP tasks such as speech recognition, spoken language systems, parsing and machine translation. Treebanks can be developed in different ways that can generally be categorized into manual and statistical approaches. While the Treebank resulting from each of these methods contains annotation errors,...
Penn Korean Treebank: Development and Evaluation
With growing interest in Korean language processing, numerous natural language processing (NLP) tools for Korean, such as part-of-speech (POS) taggers, morphological analyzers, and parsers, have been developed. This progress was possible through the availability of large-scale raw text corpora and POS tagged corpora (ETRI, 1999; Yoon and Choi, 1999a; Yoon and Choi, 1999b). However, no large-scale...
The Tibidabo Treebank (El treebank Tibidabo)
This paper describes work in progress on the creation of a new open-source resource for Spanish: an HPSG-based treebank called Tibidabo. The annotation is performed semi-automatically. First, the corpus is automatically annotated by a symbolic HPSG-based grammar for Spanish implemented on the Linguistic Knowledge Builder system; then, the output is manually disambiguated. The existence of ...
A Hidden Contributor to the Korean Miracle: The Korean Credit Union Movement
Korean credit unions (CUs) are considered to be a hidden contributor to the “Korean miracle”, characterized by remarkable economic growth and relatively low income inequality. The Korean miracle not only generated wealth for an economically strapped and socially underprivileged people, but also contributed to regional community development and the democratization of Korean society. In...
A Treebank of Spanish and its Application to Parsing
This paper presents joint research between a Spanish team and an American one on the development and exploitation of a Spanish treebank. Such treebanks for other languages have proven valuable for the development of high-quality parsers and for a wide variety of language studies. However, when the project started, at the end of 1997, there was no syntactically annotated corpus for Spanish. This...